00:02
2026-06-29
ianbarber.blog
large-language-models
Itβs always the learning rates
Scaling laws predict training loss as model size, dataset size, and compute scale, but their practical application is sensitive to hyperparameter choices like learning rate. Lilian Weng's post highligβ¦